WIRE: an Open Source Web Information Retrieval Environment
Authors
Abstract
In this paper, we describe the WIRE (Web Information Retrieval Environment) project and focus on some details of its crawler component. The WIRE crawler is a scalable, highly configurable, high-performance, open-source Web crawler, which we have used to study the characteristics of large Web collections.
Similar resources
Retrieval of health-related image information on the Web from the perspective of medical science experts: a qualitative study
Introduction: Medical images, as a source of non-textual information, play an important role in medicine. Since quality of life is directly related to health, using this type of information helps improve the practice of health professionals. This study aimed to survey medical image retrieval on the Web from the perspective of experts in medical sciences. M...
Some Approaches to Text Mining and Their Potential for Semantic Web Applications
In this paper we describe some approaches to text mining, which are supported by an original software system developed in Java to support information retrieval and text mining (JBowl), as well as its possible use in a distributed environment. JBowl is being developed as open-source software with the intention of providing an easily extensible, modular framework for pre-processi...
Carrot2: Design of a Flexible and Efficient Web Information Retrieval Framework
In this paper we present the design goals and implementation outline of Carrot2, an open source framework for rapid development of applications dealing with Web Information Retrieval and Web Mining. The framework has been written from scratch keeping in mind flexibility and efficiency of processing. We show two software architectures that meet the requirements of these two aspects and provide ev...
Behavioral Considerations in Developing Web Information Systems: User-centered Design Agenda
The current paper explores the design of a web information retrieval system that takes into account users' searching behavior in real, everyday life. Designing an information system that is closely linked to human behavior is equally important for providers and end users. From an Information Science point of view, four approaches in designing information retrieval systems were identified as system-...
Pre-processing text for web information retrieval purposes by splitting compounds into their morphemes
In web information retrieval, the interpretation of text is crucial. In this paper, we describe an approach to ease the interpretation of compound words (i.e., words that consist of other words, such as “handshake” or “blackboard”). We argue that in the web information retrieval domain, a fast decomposition of those words is necessary and a way to split as many words as possible, while we believe ...